DETAILED ACTION

1. This Office action for US Patent Application 10/743,722 is in response to 
communications filed 16 November 2007, in reply to the interview of 01 November 
2007. Currently, claims 1-24, 29, and 31-33 are pending. Of those, claim 33 is new. 
Claims 25-28 and 30 have been cancelled. 

2. In the previous Office action, claims 25-28 were rejected under 35 U.S.C. 101 as 
non-statutory, claims 1, 5, 9-13, 18-22, and 25-29 were rejected under 35 U.S.C. 102(b)
as anticipated by "Temporally Adaptive Interpolation Exploiting Temporal Masking in 
Visual Perception" (Lee et al.), claims 30 and 32 were rejected under 35 U.S.C. 103(a) 
as obvious over Lee et al., claims 2, 6-8, and 17 were rejected under 35 U.S.C. 103(a) 
as obvious over Lee et al. in view of "Scene-Context Dependent Reference Frame 
Placement for MPEG Video Coding" (Lan et al.), claims 3, 4, 14, and 23 were rejected 
under 35 U.S.C. 103(a) as obvious over Lee et al. in view of US Patent Application 
Publication 2002/0146071 A1 (Liu et al.), claims 15 and 24 were rejected under 35 
U.S.C. 103(a) as obvious over Lee et al. in view of "MPEG Video Compression 
Standard" (Mitchell), claim 16 was rejected under 35 U.S.C. 103(a) as obvious over Lee 
et al. in view of "Digitale Bildcodierung" (Ohm), and claim 31 was rejected under 35 
U.S.C. 103(a) as obvious over Lee et al. in view of "Video Indexing Using MPEG Motion 
Compensation Vectors" (Ardizzone et al.). Claim 8 was objected to for failing to limit 
parent claim 7, and claim 8 was objected to for a minor informality. 

Response to Arguments

3. Applicant's arguments, see Interview summary of 02 November 2007, with 
respect to the rejection(s) of claim(s) 1 and 10 under 35 U.S.C. 102(b) have been fully 
considered and are persuasive. Therefore, the rejection has been withdrawn. 
However, upon further consideration, a new ground(s) of rejection is made in view of US 
Patent Application Publication 2003/0142748 A1 (Tourapis et al.). 

Claim Rejections - 35 USC § 103 

4. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

5. Claims 1, 5, 10-13, 18-22, 29, and 32-33 are rejected under 35 U.S.C. 103(a) as 
being unpatentable over "Temporally Adaptive Interpolation Exploiting Temporal 
Masking in Visual Perception" (Lee et al.), in view of US Patent Application Publication 
2003/0142748 A1 (Tourapis et al.). Lee et al. teaches a method for dynamically 
determining a Group of Picture (GOP) structure in a video based on temporal 
segmentation. Regarding claim 1, in one embodiment of Lee et al., temporal 
segmentation is determined from a motion compensation error determination (pg. 519: 
column 1), which must inherently use motion vectors to determine a predicted image to 
be compared with an actual image. Lee et al. also incorporates a "typical motion 
compensation encoder" (pg. 514: column 2), which includes a motion estimation unit.
Then, Lee et al. discloses "computing motion vectors for a plurality of pictures". 

As previously mentioned, in Lee et al., motion compensation error is used to 
determine temporal segmentation. If the error between an actual frame and a predicted 
frame becomes too great, then it is determined that there is little consistency between 
frames, but if there is a small error, then temporally adjacent frames are considered to 
exhibit consistency. This information is used in a detector that finds a scene 
segmentation point, which is a point at which small changes in a single scene have 
accumulated past a certain threshold away from a reference frame. The frame 
immediately preceding the scene segmentation point becomes a P frame, and the 
frames in between the last reference frame and the scene segmentation point are 
encoded as B frames (pg. 515: columns 1-2). Then, Lee et al. teaches assigning 
pictures as B pictures based on a consistency measure. However, as discussed in the 
interview of November 15, determining motion compensation error per se is not
considered the same as determining consistent motion speed. 
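
The picture-type assignment described above can be illustrated with a minimal sketch. It is not the implementation of Lee et al.; the function name, the per-frame error list, the threshold value, and the handling of a single step that exceeds the threshold are all assumptions made for illustration.

```python
def assign_picture_types(step_errors, threshold):
    """Assign I/P/B picture types from accumulated motion-compensation error.

    step_errors[i] is a hypothetical error between frame i and frame i-1
    (step_errors[0] is unused). Frame 0 is taken as the initial I reference.
    """
    n = len(step_errors)
    types = ["B"] * n          # every frame defaults to a B picture
    types[0] = "I"
    ref = 0                    # index of the last reference frame
    acc = 0.0                  # error accumulated since the last reference
    for i in range(1, n):
        acc += step_errors[i]
        if acc > threshold:
            # Frame i is the scene segmentation point: the frame just
            # before it becomes the next P reference. If that frame is
            # already the reference, frame i itself is used (assumption).
            new_ref = max(i - 1, ref + 1)
            types[new_ref] = "P"
            # Restart accumulation from the new reference frame.
            acc = step_errors[i] if new_ref == i - 1 else 0.0
            ref = new_ref
    # Trailing B pictures are left as-is; a real encoder would close the
    # GOP with a reference picture.
    return types
```

With a steady error of 1 per frame and a threshold of 2.5, every second frame after the I picture becomes a P picture and the frames in between remain B pictures.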

Tourapis et al. teaches a video coder that encodes inter macroblocks using 
various modes. In one mode, a "Direct prediction mode", a current macroblock in a B 
picture may be calculated from previously-decoded motion information (paragraph 
0067). Then, the motion for the current picture is just re-used from the previous picture, 
instead of being re-coded and re-transmitted. When motion speed is determined to be 
constant, the motion for the current macroblock is directly taken from the corresponding 
macroblock in a reference frame (paragraph 0068). This determination is known as
Motion Projection. 
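
As a rough illustration of the idea of re-using motion when speed is constant, the following sketch checks whether a block's displacement per unit time stays the same across pictures and, if so, projects a motion vector to a new temporal distance. It is a generic sketch and not the method of Tourapis et al.; the function names, the (dx, dy, dt) input format, and the tolerance are assumptions.

```python
def has_consistent_motion(mvs, tolerance=0.5):
    """Return True if all motion vectors imply the same speed.

    mvs is a list of (dx, dy, dt) tuples for one block, where dt is the
    temporal distance (in pictures) to the reference picture.
    """
    speeds = [(dx / dt, dy / dt) for dx, dy, dt in mvs]
    vx0, vy0 = speeds[0]
    return all(
        abs(vx - vx0) <= tolerance and abs(vy - vy0) <= tolerance
        for vx, vy in speeds
    )


def project_motion(prev_mv, prev_dt, cur_dt):
    """Scale a previous motion vector to a new temporal distance."""
    dx, dy = prev_mv
    scale = cur_dt / prev_dt
    return (dx * scale, dy * scale)
```

For example, vectors (2, 1), (4, 2), and (6, 3) measured over temporal distances 1, 2, and 3 all imply the same speed, so the motion could be projected rather than re-coded.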

Lee et al. discloses the claimed invention except for determining a picture mode 
from a calculation of consistent motion speed. Tourapis et al. teaches that it was known 
to determine motion compensation mode as a result of a motion projection calculation. 
Therefore, it would have been obvious to one having ordinary skill in the art to 
determine a picture mode from motion projection, as taught by Tourapis et al., since
Tourapis et al. states in paragraph 0118 that such a modification would enable a direct 
mode coding of blocks in B pictures, further exploiting temporal redundancy with a 
current picture and reference pictures. 

Regarding claim 5, the method of Lee et al. could be adjusted to insert 1-3 
default P frames in a GOP to avoid encoding delay (pg. 516, column 2 - pg. 517, 
column 1). For a 16-frame GOP, if 1 P-frame is inserted, for example, no more than 8 
B-frames could be inserted consecutively. Even if no P-frames are inserted by default 
in a GOP, the number of consecutive B-frames is limited by the GOP size of 15 or 16 
frames, since a GOP starts with an I-frame.

Regarding claims 10-13 and 33, in Lee et al., two kinds of segmentation are 
determined, corresponding with the claimed "termination condition". The first type of 
termination is the determination of a P picture, reached when an accumulated error in 
pictures goes past a certain threshold. This corresponds with a failure in the motion 
projection of Tourapis et al., in which case it is determined that a Direct Mode coding is
inappropriate. When the threshold is reached, the frame immediately preceding the 
segmentation point becomes a P frame, and the frames in between the last reference
frame and the scene segmentation point are encoded as B frames (pg. 515: columns 1- 
2). Another segmentation detector determines an abrupt scene change, and encodes 
an I frame at the start of a new scene and a P frame at the end of the previous scene 
(pg. 515, column 1). 

Regarding claim 18, figure 1 of Lee et al. shows a Temporally Adaptive Motion 
Interpolation (TAMI) encoder. This encoder includes a buffer, a conventional MPEG 
encoder, a motion estimation unit, a scene segmentation point (SSP) detector, and a 
GOP Structure unit (pg. 514, column 2 - pg. 515, column 1). If this GOP Structure Unit 
performs the Motion Projection calculation of Tourapis et al., it corresponds with the 
claimed "colinearity detector". Regarding claim 19, the TAMI unit determines the
positions of P and B pictures in a GOP (page 514, column 2). Regarding claim 20, as 
mentioned previously, motion projection may be determined from the colinearity of 
motion vectors. Regarding claim 21, the Abrupt Scene Change (ASC) detector 
determines a scene change in an encoded video. Regarding claim 22, as mentioned 
above, at a scene change, an old scene ends with a P-frame and a new scene starts 
with an I-frame.

Regarding claim 29, in Tourapis et al., figure 6 illustrates a direct mode P picture 
at time t+2, in which the motion vector (dx, dy) for the corresponding block A at time t+1 
is extended for current block B. This corresponds with the claimed iterative method. 
Regarding claim 32, in Tourapis et al., direct mode blocks have directly temporally 
scaled motion vectors (paragraphs 0118-0119). 

6. Claims 2, 6-8, and 17 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Lee et al. in view of Tourapis et al., as applied to claims 1 and 10 above, in view of 
"Scene-Context Dependent Reference Frame Placement for MPEG Video Coding" (Lan 
et al.), cited in the Information Disclosure Statement filed 12 May 2004. Claim 2 of the 
present application recites encoding the first frame with a variance in motion speed as a 
P-frame. However, in Lee et al., the first frame with a motion inconsistency above a 
certain threshold is encoded as an I-frame, and the frame immediately previous to this
point is encoded as a P-frame (pg. 515, column 2). 

Lan et al. teaches a picture-type assignment algorithm in which if the difference 
in accumulated motion between a current frame and a reference frame is above a 
certain value, the current frame is encoded as a P-frame, and becomes the next 
reference frame (pg. 481, column 2). 

Lee et al., in combination with Tourapis et al., discloses the claimed invention 
except for encoding the first frame that does not follow a frame trend as a P-frame. Lan 
et al. teaches that it was known to encode a significantly changed frame as a P-frame. 
Therefore, it would have been obvious for one having ordinary skill in the art at the time 
the invention was made to encode reference frames as P-frames rather than I-frames
as taught by Lan et al., since it was well-known in the art that P-frames require fewer bits
to be encoded than I-frames.

Additionally, claims 6 and 17 recite coding some pictures as I pictures for a 
random-access policy. Lee et al. and Tourapis et al. do not teach this limitation. Lan et 
al. teaches an MPEG coding method in which frame type assignment is varied.
Regarding claims 6 and 17, Lan et al. discloses forcing I frames into a coded video 
sequence every 15 frames to facilitate random access (pg. 486, column 1). Regarding 
claim 7, in Lan et al., whenever an I-frame is encoded, the previous frame is encoded
as a P-frame (pg. 481, column 1). Regarding claim 8, in Lee et al., P frames can be 
encoded as P1 frames which are regular MPEG P frames, or as P2 frames, which have 
the same bit allocation as MPEG B frames and are thus coarsely quantized (pg. 514, 
column 2). 

Lee et al., in combination with Tourapis et al., discloses the claimed invention 
except for forcing I-frame encoding. Lan et al. teaches that it was known to encode
I-frames at regular intervals. Therefore, it would have been obvious to one having
ordinary skill in the art at the time the invention was made to modify the coding method 
of Lee et al. to insert periodic I frames as taught by Lan et al., since Lan et al. states in 
page 486, column 1 that such a modification would enable random search and pause 
features at playback time. 

7. Claims 3, 4, 14, and 23 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Lee et al. in view of Tourapis et al. as applied to claims 1, 12, and 21,
in view of US Patent Application Publication 2002/0146071 A1 (Liu et al.). Lee et al.
teaches scene change detection, but always encodes the first picture after the scene 
change as an I-frame and the last picture before the scene change as a P-frame.

Liu et al. teaches a scene change detection component in a video encoder. In
Liu et al., a scene change is normally encoded as an I-frame. However, this is not
always the most efficient coding method. Figure 10 shows a scene change between
frame 1001 and frame 1002. Frame 1001 was originally scheduled to be encoded as
an I-frame, but since a scene change immediately follows, much computational effort
would be wasted in calculating high-quality images immediately after the scene change. 
Then, frame 1001 is instead encoded as a P-frame, and frames 1002 and 1048 are 
encoded as low-quality predictive frames, since human vision is insensitive to quality 
changes near a scene change (paragraph [0079]). Figure 11 gives a further example. 
Here, a scene change occurs immediately preceding P-frame 1102. Frame 1104, two 
frames before the scene change, was originally scheduled as an I-frame, but instead
the I-frame is delayed until frame 1110, for which motion vectors have not yet been
calculated (paragraph [0080]). Finally, figure 13 shows a scene change immediately
preceding P-frame 1302, which was originally scheduled as an I-frame. However, since
motion vectors 1304 and 1306 to frame 1302 have already been calculated, the I-frame
is delayed until frame 1308, originally scheduled to be the next P-frame (paragraph 
[0082]). 

Lee et al., in combination with Tourapis et al., teaches the claimed invention 
except for encoding P-frames immediately surrounding scene changes. Liu et al. 
teaches that it was known to encode a frame immediately preceding or immediately 
following a scene change as a P-frame. Therefore, it would have been obvious to one 
having ordinary skill in the art at the time the invention was made to encode frames 
adjacent to scene changes as P-frames as taught by Liu et al., since Liu et al. states in
paragraph [0079] that such a modification would increase encoding efficiency by not 
encoding irrelevant data near a scene change, at which time the human eye cannot 
clearly distinguish details of an image. 

8. Claims 15 and 24 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Lee et al., in view of Tourapis et al., as applied to claims 10 and 21 above, and in 
further view of "MPEG Video Compression Standard" (Mitchell), cited in the Information 
Disclosure Statement of 17 July 2006. Although in Lee et al., a default picture is 
encoded as a B-frame, Lee et al. does not explicitly state that pictures adjacent to scene 
changes are B-frames. However, Mitchell states that since the eye is insensitive to 
image content near scene changes, image quality can be sacrificed. One method of 
reducing image quality is to start a new scene with B pictures (footnote 13). 

Lee et al., in combination with Tourapis et al., discloses the claimed invention 
except for encoding B-frames adjacent to a scene change. Mitchell teaches that it was 
known to encode B-frames immediately following a scene change. Therefore, it would 
have been obvious for one having ordinary skill in the art at the time the invention was 
made to force B-frames immediately following a scene change, as taught by Mitchell, 
since Mitchell states in page 79 that such a modification would reduce the bit rate 
needed to encode a scene change. 

9. Claim 16 is rejected under 35 U.S.C. 103(a) as being unpatentable over Lee et
al. in view of Tourapis et al. as applied to claim 12 above, and in further view of "Digitale 
Bildcodierung" (Ohm), cited in the Information Disclosure Statement of 17 July 2006. 
Lee et al. teaches scene change detection based on a low correlation between two 
images (pg. 515, column 1), but does not disclose the exact method used. Ohm 
teaches the Normalized Cross-Correlation Function (NCCF), shown as equation 5.52. 
NCCF is used in many pattern-matching applications, such as motion estimation (pg. 1). 
Two images, x_a(m_a, n_a) and y_a(m_a, n_a), are compared over pixels (m_a, n_a) in area A.
This corresponds with images x_n(i, j) and x_{n+1}(i, j) in area (M, N) in the present
invention. Two pictures have the highest match when the NCCF is at a maximum (pg.
3), and correspondingly, two pictures have a low match, indicative of a scene change, 
when the value of NCCF is low. 
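
For illustration, a normalized cross-correlation between two equally sized images can be sketched as below. Equation 5.52 of Ohm is not reproduced in this action, so the formula used here is a common textbook form and may differ from it in normalization; the function name and the list-of-rows image format are assumptions.

```python
import math


def normalized_cross_correlation(x, y):
    """Return sum(x*y) / sqrt(sum(x^2) * sum(y^2)) over all pixels.

    x and y are images given as equally sized lists of rows of pixel
    values. A value near 1 indicates a close match; a low value suggests
    little similarity, as might occur across a scene change.
    """
    pairs = [(a, b) for row_x, row_y in zip(x, y) for a, b in zip(row_x, row_y)]
    cross = sum(a * b for a, b in pairs)
    energy_x = sum(a * a for a, _ in pairs)
    energy_y = sum(b * b for _, b in pairs)
    denom = math.sqrt(energy_x * energy_y)
    # Two all-zero images have no defined correlation; return 0.0 here.
    return cross / denom if denom else 0.0
```

Comparing an image with itself gives the maximum value of 1, while comparing it with a dissimilar image gives a lower value that could be tested against a chosen threshold.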

Lee et al., in combination with Tourapis et al., discloses the claimed invention 
except for the exact method used to determine correlation of two images. Ohm teaches 
that it was known to determine how closely two images match each other with 
Normalized Cross-Correlation. Therefore, it would have been obvious to one having 
ordinary skill in the art at the time the invention was made to determine the correlation 
of two images using NCCF, as taught by Ohm, since Ohm states in page 4 that such a 
modification would allow for a more accurate comparison of the similarity of two images 
rather than by difference levels alone. 

10. Claim 31 is rejected under 35 U.S.C. 103(a) as being unpatentable over Lee et
al. in view of Tourapis et al., as applied to claim 29 above, and in further view of "Video
Indexing Using MPEG Motion Compensation Vectors" (Ardizzone et al.).
Conventionally, a motion vector for a block is defined as the displacement of the block 
between two pictures, velocity is defined as displacement over time, and speed is 
defined as the magnitude of velocity. However, while two-dimensional displacement is 
normally given with the Euclidean distance metric, the square root of the sum of the
squares of the x and y components, in claim 31, displacement is given with the
Manhattan distance metric, the sum of the absolute values of the x and y components. Ardizzone et al.
teaches a method for spatially segmenting an MPEG image with motion vectors (pg. 
725, columns 1-2). In one step of Ardizzone et al., magnitudes of the motion vectors 
are built into a histogram to determine "dominant" regions of the image (pg. 727, column 
2). If a motion vector has a large magnitude, this means that its macroblock is 
displaced a large distance, and so has a high speed. An experiment was performed to 
determine how best to retrieve related images to a given image, by matching motion 
vector characteristics (pg. 728, column 2 - pg. 729, column 1). Using a Manhattan 
distance metric yielded the best result (pg. 729, column 1). 
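
The two distance metrics contrasted above can be compared with a short sketch. The function names are chosen for illustration and do not come from the claims or from Ardizzone et al.

```python
import math


def euclidean_magnitude(dx, dy):
    """Square root of the sum of the squared components."""
    return math.hypot(dx, dy)


def manhattan_magnitude(dx, dy):
    """Sum of the absolute values of the components."""
    return abs(dx) + abs(dy)
```

For a motion vector of (3, -4), the Euclidean magnitude is 5 and the Manhattan magnitude is 7; the Manhattan form avoids the multiplications and square root.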

Lee et al. discloses the claimed invention except for defining pixel block 
displacement with a Manhattan distance metric. Ardizzone et al. teaches that it was 
known to calculate motion vector magnitude with Manhattan distance. Therefore, it 
would have been obvious to one having ordinary skill in the art at the time the invention 
was made to determine motion speed of an image based on the Manhattan distance 
metric, as taught by Ardizzone et al., since Ardizzone et al. states in page 729, column
1, that such a modification would produce the greatest accuracy in characterizing the 
motion vectors of the image. 



Conclusion 

11. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. US Patent 5,428, ?96 A (Yagasaki et al.) teaches that "when 
there is a strong correlation among the pictures and an object moves in a straight path 
at a substantially constant speed", motion vectors in the pictures scale linearly. US 
Patent 5,745,182 A (Yukitake et al.) teaches a motion compensation determination 
system for interlaced images in which the previous odd and even fields are both used 
as reference images for a current field. US Patent 6,380,986 B1 (Minami et al.) teaches 
a "telescopic" motion vector determination system in which the center of a motion vector 
search for a current image is centered on the projection of the motion vector of the 
corresponding block in the previous image. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to David N. Werner whose telephone number is (571) 272- 
9662. The examiner can normally be reached on Monday-Friday from 8:30 AM - 5:00 
PM. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Mehrdad Dastouri, can be reached on (571) 272-7418. The fax phone
number for the organization where this application or proceeding is assigned is 571-
273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 




DNW 



MEHRDAD DASTOURI 
SUPERVISORY PATENT EXAMINER 



